The listing of claims will replace all prior versions, and listings, of claims in the 
application: 

Listing of Claims: 

1. (Original) A voice data selector, comprising: 

memory means for storing a plurality of voice data expressing voice waveforms; 

search means for inputting text information expressing a text and retrieving voice 
data expressing a waveform of a voice unit whose reading is common to that of a voice 
unit which constitutes the text from among the voice data; and 

selection means for selecting each one of voice data corresponding to each 
voice unit which constitutes the text from among the searched voice data so that a value 
obtained by totaling difference of pitches in boundaries of adjacent voice units in the 
whole text may become minimum. 
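Claim 1 states the selection objective but not how the minimum is found. As a non-authoritative sketch, the minimization can be carried out with dynamic programming (a Viterbi-style search). It assumes that each candidate voice datum is summarized by hypothetical start and end pitch values in Hz, and that the "difference of pitches in boundaries" is the absolute difference between the end pitch of one unit and the start pitch of the next:

```python
from typing import List, Sequence, Tuple

# Hypothetical summary of one candidate: (start_pitch_hz, end_pitch_hz).
Candidate = Tuple[float, float]


def select_min_boundary_pitch(candidates: Sequence[Sequence[Candidate]]) -> List[int]:
    """Pick one candidate index per voice unit so that the total of
    |end pitch of unit i - start pitch of unit i+1| over all boundaries is minimal."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every voice unit needs at least one candidate")
    # cost[j]: minimal total boundary difference of any path ending in candidate j
    cost = [0.0] * len(candidates[0])
    back: List[List[int]] = []
    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for start, _ in cur:
            # Best predecessor for this candidate, given the boundary pitch jump.
            best_k = min(range(len(prev)),
                         key=lambda k: cost[k] + abs(prev[k][1] - start))
            new_cost.append(cost[best_k] + abs(prev[best_k][1] - start))
            pointers.append(best_k)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest candidate of the last unit.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return path[::-1]
```

The search runs in time proportional to the number of units times the square of the candidates per unit, instead of enumerating every combination.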

2. (Original) The voice data selector according to claim 1, further comprising: 

speech synthesis means of generating data expressing synthetic speech by 
combining selected voice data mutually. 

3. (Original) A voice data selection method, the method comprising the steps of: 
storing a plurality of voice data expressing voice waveforms; 

inputting text information expressing a text, retrieving voice data expressing a 
waveform of a voice unit whose reading is common to that of a voice unit which 
constitutes the text from among the voice data; and 

selecting each one of voice data corresponding to each voice unit which 
constitutes the text from among the retrieved voice data so that a value obtained by 
totaling difference of pitches in boundaries of adjacent voice units in the whole text may 
become minimum. 
4. (Original) A program for causing a computer to function as: 

memory means for storing a plurality of voice data expressing voice waveforms; 

search means for inputting text information expressing a text and retrieving voice 
data expressing a waveform of a voice unit whose reading is common to that of a voice 
unit which constitutes the text from among the voice data; and 

selection means for selecting each one of voice data corresponding to each 
voice unit which constitutes the text from among the searched voice data so that a value 
obtained by totaling difference of pitches in boundaries of adjacent voice units in the 
whole text may become minimum. 

5. (Currently Amended) A voice selector, comprising: 

memory means for storing a plurality of voice data expressing voice waveforms; 

prediction means for predicting time series change of pitch of a voice unit by 
inputting text information expressing a text and performing cadence prediction for a 
voice unit which constitutes the text concerned; and 

selection means for selecting from among the voice data the voice data 
which expresses a waveform of a voice unit whose reading is common to that of a voice 
unit which constitutes the text, and whose time series change of pitch has the highest 
correlation with prediction results by the prediction means. 

6. (Currently Amended) The voice selector according to claim 5, wherein the 
selection means may specify strength of correlation between time series change of pitch 
of voice data, and results of prediction by the prediction means on the basis of 
results of regression calculation which performs primary regression between time 
series change of pitch of a voice unit which voice data expresses, and time series 
change of pitch of a voice unit in the text whose reading is common to the voice unit 
concerned. 
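The "primary regression" of claim 6 reads as a first-order (linear) least-squares fit. A minimal sketch, assuming both pitch contours have already been resampled to the same number of points, fits predicted = gradient × candidate + intercept; the function name and resampling step are illustrative assumptions:

```python
from typing import Sequence, Tuple


def primary_regression(candidate_pitch: Sequence[float],
                       predicted_pitch: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of predicted = gradient * candidate + intercept.

    Both contours are assumed to be sampled at the same number of points."""
    n = len(candidate_pitch)
    if n != len(predicted_pitch) or n < 2:
        raise ValueError("contours must have equal length >= 2")
    mean_x = sum(candidate_pitch) / n
    mean_y = sum(predicted_pitch) / n
    sxx = sum((x - mean_x) ** 2 for x in candidate_pitch)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(candidate_pitch, predicted_pitch))
    if sxx == 0:
        raise ValueError("candidate contour is flat; gradient is undefined")
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept
```

A gradient near 1 and an intercept near 0 would suggest the stored contour already follows the prediction; the claims do not say how gradient and intercept are mapped to a correlation strength.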
7. (Currently Amended) The voice selector according to claim 5, wherein the 
selection means may specify strength of correlation between time series change of pitch 
of voice data, and results of prediction by the prediction means on the basis of a 
correlation coefficient between time series change of pitch of a voice unit which voice 
data expresses, and time series change of pitch of a voice unit in the text whose 
reading is common to the voice unit concerned. 
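Claims 5 and 7 together select the candidate whose pitch contour correlates best with the predicted contour. A minimal sketch, assuming equal-length contours and using the Pearson correlation coefficient; the candidate IDs and the treatment of flat contours are hypothetical choices:

```python
import math
from typing import Dict, Sequence


def correlation_coefficient(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length pitch contours."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("contours must have equal length >= 2")
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: a flat contour counts as uncorrelated
    return cov / math.sqrt(va * vb)


def select_best_correlated(candidates: Dict[str, Sequence[float]],
                           predicted: Sequence[float]) -> str:
    """Return the ID of the candidate whose contour has the highest correlation."""
    return max(candidates,
               key=lambda cid: correlation_coefficient(candidates[cid], predicted))
```

Because the coefficient ignores absolute pitch level and scale, a stored unit spoken at a different register can still match the predicted shape.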

8. (Currently Amended) A voice selector, comprising: 

memory means for storing a plurality of voice data expressing voice waveforms; 

prediction means for predicting time length of a voice unit and time series change of 
pitch of the voice unit concerned by inputting text information expressing a text and 
performing cadence prediction for the voice unit in the text concerned; and 

selection means for specifying an evaluation value of each voice data expressing 
a waveform of a voice unit whose reading is common to a voice unit in the text and 
selecting voice data whose evaluation value expresses the highest evaluation, and in 
that the evaluation value is obtained from a function of a numerical value which 
expresses correlation between time series change of pitch of a voice unit which voice 
data expresses, and prediction results of time series change of pitch of a voice 
unit in the text whose reading is common to the voice unit concerned, and a function of 
difference between prediction results of time length of a voice unit which the voice 
data concerned expresses, and time length of a voice unit in the text whose reading is 
common to the voice unit concerned. 
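Claim 8 combines a function of the pitch correlation with a function of the time-length difference, without fixing either function. A minimal sketch, assuming a linear combination with hypothetical weights `w_corr` and `w_len` and a precomputed correlation value per candidate:

```python
from typing import Dict, Tuple


def evaluation_value(correlation: float, predicted_length: float,
                     candidate_length: float,
                     w_corr: float = 1.0, w_len: float = 5.0) -> float:
    """Higher is better: reward pitch correlation, penalize time-length mismatch.

    The linear form and both weights are illustrative assumptions."""
    return w_corr * correlation - w_len * abs(predicted_length - candidate_length)


def select_by_evaluation(candidates: Dict[str, Tuple[float, float]],
                         predicted_length: float) -> str:
    """candidates maps a hypothetical voice-data ID to (correlation, length in seconds)."""
    return max(candidates,
               key=lambda cid: evaluation_value(candidates[cid][0], predicted_length,
                                                candidates[cid][1]))
```

With these weights a candidate that is 0.1 s too long loses 0.5 points, so a slightly less correlated but better-timed unit can win.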

9. (Original) The voice selector according to claim 8, wherein the numerical 
value expressing correlation comprises a gradient of a primary function obtained by the 
primary regression between time series change of pitch of a voice unit which voice data 
expresses, and time series change of pitch of a voice unit in the text whose reading is 
common to that of the voice unit concerned. 

10. (Original) The voice selector according to claim 8, wherein the numerical 
value expressing correlation comprises an intercept of a primary function obtained by 
the primary regression between time series change of pitch of a voice unit which voice 
data expresses, and time series change of pitch of a voice unit in the text whose 
reading is common to that of the voice unit concerned. 

11. (Currently Amended) The voice selector according to claim 8, wherein the 
numerical value expressing correlation comprises a correlation coefficient between time 
series change of pitch of a voice unit which voice data expresses, and prediction 
results of time series change of pitch of a voice unit in the text whose reading is 
common to that of the voice unit concerned. 

12. (Currently Amended) The voice selector according to claim 8, wherein the 
numerical value expressing correlation comprises the maximum value of correlation 
coefficients between a function obtained by giving cyclic shifts of various bit counts to data 
expressing time series change of pitch of a voice unit which voice data expresses, and 
a function expressing prediction results of time series change of pitch of a voice 
unit in the text whose reading is common to that of the voice unit concerned. 
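Claim 12 takes the maximum correlation over cyclically shifted versions of the stored contour. A minimal sketch, assuming the "bit count" of a shift is a number of samples and that both contours have equal length; the helper names are hypothetical:

```python
import math
from typing import Sequence, Tuple


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; a flat contour is treated as uncorrelated (assumption)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def max_cyclic_shift_correlation(candidate: Sequence[float],
                                 predicted: Sequence[float]) -> Tuple[float, int]:
    """Return (best correlation, shift in samples) over all cyclic shifts of candidate."""
    if len(candidate) != len(predicted) or len(candidate) < 2:
        raise ValueError("contours must have equal length >= 2")
    best_r, best_k = -math.inf, 0
    for k in range(len(candidate)):
        shifted = list(candidate[k:]) + list(candidate[:k])  # rotate left by k samples
        r = _pearson(shifted, predicted)
        if r > best_r:
            best_r, best_k = r, k
    return best_r, best_k
```

Trying every shift costs time proportional to the square of the contour length, which is small for per-unit contours.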

13. (Original) The voice selector according to any one of claims 5 to 12, wherein 
the memory means stores phonetic data expressing reading of voice data with 
associating it with the voice data concerned; and 

wherein the selection means treats voice data, with which phonetic data 
expressing the reading agreeing with the reading of a voice unit in the text is 
associated, as voice data expressing a waveform of a voice unit whose reading is 
common to the voice unit concerned. 
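Claim 13 matches voice data by the phonetic data stored with it. A minimal sketch, assuming readings are plain strings (romanized here for illustration) and voice data is identified by hypothetical IDs:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_reading_index(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each reading to the IDs of voice data whose phonetic data has that reading."""
    index: Dict[str, List[str]] = defaultdict(list)
    for voice_id, reading in entries:
        index[reading].append(voice_id)
    return dict(index)


def candidates_for(text_units: List[str], index: Dict[str, List[str]]) -> List[List[str]]:
    """For each voice unit in the text, list the voice data whose reading agrees with it."""
    return [index.get(unit, []) for unit in text_units]
```

An empty list marks a voice unit for which no stored voice data can be selected.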

14. (Currently Amended) The voice selector according to any one of claims 5 to 
[[13]] 12, further comprising: 

speech synthesis means of generating data expressing synthetic speech by 
combining selected voice data mutually. 

15. (Original) The voice selector according to claim 14, comprising: 

lacked portion synthesis means of synthesizing voice data expressing a 
waveform of a voice unit in regard to the voice unit, on which the selection means was 
not able to select voice data, among voice units in the text without using voice data 
which the memory means stores, and in that the speech synthesis means generates 
data expressing synthetic speech by combining voice data, which the selection means 
selected, with voice data which the lacked portion synthesis means synthesizes. 
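Claim 15 fills the gaps left by selection with separately synthesized voice data. A minimal sketch, assuming waveforms are lists of samples, that `None` marks a unit with no selected voice data, and that `synthesize_missing` is a hypothetical stand-in for a synthesizer that does not use the stored voice data:

```python
from typing import Callable, List, Optional, Sequence

Waveform = List[float]


def assemble_speech(units: Sequence[str],
                    selected: Sequence[Optional[Waveform]],
                    synthesize_missing: Callable[[str], Waveform]) -> Waveform:
    """Concatenate waveforms in text order, synthesizing units that lack selected data."""
    if len(units) != len(selected):
        raise ValueError("one selection slot is needed per voice unit")
    output: Waveform = []
    for unit, wave in zip(units, selected):
        if wave is None:
            # Lacked portion: no stored voice data was selected for this unit.
            wave = synthesize_missing(unit)
        output.extend(wave)
    return output
```

Plain concatenation is a simplification; a practical synthesizer would also smooth the joins.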

16. (Currently Amended) A voice selection method, the method comprising the 
steps of: 

storing a plurality of voice data expressing voice waveforms; 

predicting time series change of pitch of a voice unit by inputting text information 
expressing a text and performing cadence prediction for a voice unit which constitutes 
the text concerned; and 

selecting from among the voice data the voice data which expresses a waveform 
of a voice unit whose reading is common to that of a voice unit which constitutes the 
text, and whose time series change of pitch has the highest correlation with prediction 
results by the prediction means. 



- 7 - Application Serial No. 10/559,573 

Attorney Docket No. 0670-7063 

17. (Currently Amended) A voice selection method, the method comprising the 
steps of: 

storing a plurality of voice data expressing voice waveforms; 

predicting time length of voice unit and time series change of pitch of the voice 
unit concerned by inputting text information expressing a text and performing cadence 
prediction for a voice unit in the text concerned; and 

specifying an evaluation value of each voice data expressing a waveform of a 
voice unit whose reading is common to a voice unit in the text and selecting voice data 
whose evaluation value expresses the highest evaluation, and in that the evaluation 
value is obtained from a function of a numerical value which expresses correlation 
between time series change of pitch of a voice unit which voice data expresses, and 
prediction results of time series change of pitch of a voice unit in the text whose 
reading is common to the voice unit concerned, and a function of difference between 
prediction results of time length of a voice unit which the voice data concerned 
expresses, and time length of a voice unit in the text whose reading is common to the 
voice unit concerned. 

18. (Currently Amended) A program for causing a computer to function as: 

memory means for storing a plurality of voice data expressing voice waveforms; 

prediction means for predicting time series change of pitch of a voice unit by 
inputting text information expressing a text and performing cadence prediction for a 
voice unit which constitutes the text concerned; and 

selection means for selecting from among the voice data the voice data which 
expresses a waveform of a voice unit whose reading is common to that of a voice unit 
which constitutes the text, and whose time series change of pitch has the highest 
correlation with prediction results by the prediction means. 



19. (Currently Amended) A program for causing a computer to function as: 

memory means for storing a plurality of voice data expressing voice waveforms; 

prediction means for predicting time length of a voice unit and time series change 
of pitch of the voice unit concerned by inputting text information expressing a text and 
performing cadence prediction for a voice unit in the text concerned; and 

selection means for specifying an evaluation value of each voice data expressing 
a waveform of a voice unit whose reading is common to a voice unit in the text and 
selecting voice data whose evaluation value expresses the highest evaluation, and in 
that the evaluation value is obtained from a function of a numerical value which 
expresses correlation between time series change of pitch of a voice unit which voice 
data expresses, and prediction results of time series change of pitch of a voice 
unit in the text whose reading is common to the voice unit concerned, and a function of 
difference between prediction results of time length of a voice unit which the voice 
data concerned expresses, and time length of a voice unit in the text whose reading is 
common to the voice unit concerned. 

20. (Original) A voice data selector, comprising: 

memory means for storing a plurality of voice data expressing voice waveforms; 

text information input means of inputting text information expressing a text; 

a search section for searching voice data which has a portion whose reading is 
common to that of a voice unit in a text which the text information expresses; and 

selection means for obtaining an evaluation value according to predetermined 
evaluation criteria on the basis of relationship between mutually adjacent voice data 
when each of the searched voice data is connected according to the text which text 
information expresses, and selecting combination of voice data, which is outputted, on 
the basis of the evaluation value concerned. 

21. (Original) The voice data selector according to claim 20, wherein the 
evaluation criterion is a criterion which determines an evaluation value which shows 
relationship between mutually adjacent voice data; and 

wherein the evaluation value is obtained on the basis of an evaluation expression 
which contains at least any one of a parameter which shows a feature of voice which 
the voice data expresses, a parameter which shows a feature of voice obtained by 
mutually combining voice which the voice data expresses, and a parameter which 
shows a feature relating to speech time length. 
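Claim 21 allows an evaluation expression built from any of three parameter types. A minimal sketch, assuming two of them (the pitch jump at each boundary between adjacent voice data, and the deviation of each voice datum's duration from a target duration), hypothetical weights, and lower cost meaning better evaluation. It scores every combination exhaustively, which is only practical for short texts:

```python
import itertools
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class VoiceData:
    start_pitch: float  # Hz at the start of the stored waveform (hypothetical field)
    end_pitch: float    # Hz at the end of the stored waveform (hypothetical field)
    duration: float     # length of the stored waveform in seconds (hypothetical field)


def combination_cost(combo: Sequence[VoiceData], target_durations: Sequence[float],
                     w_pitch: float = 1.0, w_len: float = 100.0) -> float:
    """Lower cost means a better evaluation; weights are illustrative assumptions."""
    # Parameter from combining adjacent voice data: pitch jump at each boundary.
    pitch_term = sum(abs(a.end_pitch - b.start_pitch) for a, b in zip(combo, combo[1:]))
    # Parameter relating to speech time length: deviation from target durations.
    length_term = sum(abs(v.duration - t) for v, t in zip(combo, target_durations))
    return w_pitch * pitch_term + w_len * length_term


def select_combination(candidates: Sequence[Sequence[VoiceData]],
                       target_durations: Sequence[float]) -> Tuple[int, ...]:
    """Score every combination (one candidate per unit) and return the best indices."""
    return min(
        itertools.product(*(range(len(c)) for c in candidates)),
        key=lambda idx: combination_cost(
            [candidates[unit][i] for unit, i in enumerate(idx)], target_durations),
    )
```

Because both terms decompose over single units and adjacent pairs, a dynamic-programming search would give the same result with far less work on longer texts.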

22. (Original) The voice data selector according to claim 20, wherein the 
evaluation criterion is a criterion which determines an evaluation value which shows 
relationship between mutually adjacent voice data; and that the evaluation value 
includes a parameter which shows a feature of voice obtained by mutually combining 
voice which the voice data expresses, and is obtained on the basis of an evaluation 
expression which contains at least any one of a parameter which shows a feature of 
voice which the voice data expresses, and a parameter which shows a feature relating 
to speech time length. 

23. (Original) The voice data selector according to claim 21 or 22, wherein the 
parameter which shows a feature of voice obtained by mutually combining voice which 
the voice data expresses is obtained on the basis of difference between pitches in a 
boundary of mutually adjacent voice data in the case of selecting at a time one voice 
data corresponding to each voice unit which constitutes the text from among voice data 
which express waveforms of voice having a portion whose reading is common to that 
of a voice unit in a text which the text information expresses. 

24. (Currently Amended) The voice data selector according to any one of claims 
20 to [[23]] 22, wherein the evaluation criterion further includes a reference which 
determines an evaluation value which expresses correlation or difference between 
voice, which voice data expresses, and cadence prediction results of the cadence 
prediction means; and that the evaluation value is obtained on the basis of a function of 
a numerical value which expresses correlation between time series change of pitch of a 
voice unit which voice data expresses, and prediction results of time series 
change of pitch of a voice unit in the text whose reading is common to the voice unit 
concerned, and/or a function of difference between prediction results of time 
length of a voice unit which the voice data concerned expresses, and time length of a 
voice unit in the text whose reading is common to the voice unit concerned. 

25. (Original) The voice data selector according to claim 24, wherein the 
numerical value expressing correlation comprises a gradient and/or an intercept of a 
primary function obtained by the primary regression between time series change of 
pitch of a voice unit which voice data expresses, and time series change of pitch of a 
voice unit in the text whose reading is common to that of the voice unit concerned. 

26. (Currently Amended) The voice data selector according to claim [[24 or]] 25, 
wherein the numerical value expressing correlation comprises a correlation coefficient 
between time series change of pitch of a voice unit which voice data expresses, and 
prediction results of time series change of pitch of a voice unit in the text whose 
reading is common to that of the voice unit concerned. 

27. (Currently Amended) The voice data selector according to claim [[24 or]] 25, 
wherein the numerical value expressing correlation comprises the maximum value of 
correlation coefficients between a function obtained by giving cyclic shifts of various bit counts to 
data expressing time series change of pitch of a voice unit which voice data 
expresses, and a function expressing prediction results of time series change of 
pitch of a voice unit in the text whose reading is common to that of the voice unit 
concerned. 

28. (Currently Amended) The voice selector according to any one of claims 20 
to [[27]] 22, wherein the memory means stores phonetic data expressing reading of 
voice data with associating it with the voice data concerned; and 

wherein the selection means treats voice data, with which phonetic data 
expressing reading agreeing with the reading of a voice unit in the text is associated, as 
voice data expressing a waveform of a voice unit whose reading is common to the voice 
unit concerned. 

29. (Currently Amended) The voice selector according to any one of claims 20 
to [[28]] 22, further comprising speech synthesis means of generating data expressing 
synthetic speech by combining selected voice data mutually. 

30. (Original) The voice data selector according to claim 29, comprising: 

lacked portion synthesis means for synthesizing voice data expressing a 
waveform of a voice unit in regard to a voice unit, on which the selection means is not 
able to select voice data, among voice units in the text without using voice data which 
the memory means stores, and in that the speech synthesis means generates data 
expressing synthetic speech by combining voice data, which the selection means 
selects, with voice data which the lacked portion synthesis means synthesizes. 

31. (Original) A voice data selection method, the method comprising the steps of: 

storing a plurality of voice data expressing voice waveforms; 
inputting text information expressing a text; 
searching voice data which has a portion whose reading is common to that of a 
voice unit in a text which the text information expresses; 

obtaining an evaluation value according to predetermined evaluation criteria on 
the basis of relationship between mutually adjacent voice data when each of the 
searched voice data is connected according to a text which text information expresses; 
and 

selecting combination of voice data, which is outputted, on the basis of the 
evaluation value concerned. 

32. (Original) A program for causing a computer to function as: 

memory means for storing a plurality of voice data expressing voice waveforms; 

text information input means for inputting text information expressing a text; 

a search section for searching voice data which has a portion whose reading is 
common to that of a voice unit in a text which the text information expresses; and 

selection means for obtaining an evaluation value according to a predetermined 
evaluation criterion on the basis of relationship between mutually adjacent voice data 
when each of the searched voice data is connected according to a text which text 
information expresses, and selecting combination of voice data, which is outputted, on 
the basis of the evaluation value concerned. 



